{
  "nbformat": 4,
  "nbformat_minor": 0,
  "metadata": {
    "colab": {
      "provenance": []
    },
    "kernelspec": {
      "name": "python3",
      "display_name": "Python 3"
    },
    "language_info": {
      "name": "python"
    }
  },
  "cells": [
    {
      "cell_type": "markdown",
      "source": [
        "# LiteLLM - Benchmark Llama 2, Claude Instant 1.2 and GPT-3.5 for a use case\n",
        "In this notebook, we run the same questions against each model for a given use case and compare:\n",
        "* LLM Response\n",
        "* Response Time\n",
        "* Response Cost"
      ],
      "metadata": {
        "id": "4Cq-_Y-TKf0r"
      }
    },
    {
      "cell_type": "code",
      "execution_count": null,
      "metadata": {
        "id": "O3ENsWYB27Mb"
      },
      "outputs": [],
      "source": [
        "!pip install litellm"
      ]
    },
    {
      "cell_type": "markdown",
      "source": [
        "## Example Use Case 1 - Code Generator\n",
        "### For this use case, enter your system prompt and questions\n"
      ],
      "metadata": {
        "id": "Pk55Mjq_3DiR"
      }
    },
    {
      "cell_type": "code",
      "source": [
        "# enter your system prompt if you have one\n",
        "system_prompt = \"\"\"\n",
        "You are a coding assistant helping users using litellm.\n",
        "litellm is a light package to simplify calling OpenAI, Azure, Cohere, Anthropic, Huggingface API Endpoints\n",
        "--\n",
        "Sample Usage:\n",
        "```\n",
        "pip install litellm\n",
        "from litellm import completion\n",
        "## set ENV variables\n",
        "os.environ[\"OPENAI_API_KEY\"] = \"openai key\"\n",
        "os.environ[\"COHERE_API_KEY\"] = \"cohere key\"\n",
        "messages = [{ \"content\": \"Hello, how are you?\",\"role\": \"user\"}]\n",
        "# openai call\n",
        "response = completion(model=\"gpt-3.5-turbo\", messages=messages)\n",
        "# cohere call\n",
        "response = completion(\"command-nightly\", messages)\n",
        "```\n",
        "\n",
        "\"\"\"\n",
        "\n",
        "\n",
        "# questions/logs you want to run the LLMs on\n",
        "questions = [\n",
        "    \"what is litellm?\",\n",
        "    \"why should I use LiteLLM\",\n",
        "    \"does litellm support Anthropic LLMs\",\n",
        "    \"write code to make a litellm completion call\",\n",
        "]"
      ],
      "metadata": {
        "id": "_1SZYJFB3HmQ"
      },
      "execution_count": 21,
      "outputs": []
    },
    {
      "cell_type": "markdown",
      "source": [
        "## Running questions\n",
        "### Select from 100+ LLMs here: https://docs.litellm.ai/docs/providers"
      ],
      "metadata": {
        "id": "AHH3cqeU3_ZT"
      }
    },
    {
      "cell_type": "code",
      "source": [
        "import litellm\n",
        "from litellm import completion, completion_cost\n",
        "import os\n",
        "import time\n",
        "\n",
        "# optional: use the litellm dashboard to view logs\n",
        "# litellm.use_client = True\n",
        "# litellm.token = \"ishaan_2@berri.ai\" # set your email\n",
        "\n",
        "\n",
        "# set API keys\n",
        "os.environ['TOGETHERAI_API_KEY'] = \"\"\n",
        "os.environ['OPENAI_API_KEY'] = \"\"\n",
        "os.environ['ANTHROPIC_API_KEY'] = \"\"\n",
        "\n",
        "\n",
        "# select LLMs to benchmark\n",
        "# using https://api.together.xyz/playground for llama2\n",
        "# try any supported LLM here: https://docs.litellm.ai/docs/providers\n",
        "\n",
        "models = ['togethercomputer/llama-2-70b-chat', 'gpt-3.5-turbo', 'claude-instant-1.2']\n",
        "data = []\n",
        "\n",
        "for question in questions: # group by question\n",
        "  for model in models:\n",
        "    print(f\"running question: {question} for model: {model}\")\n",
        "    start_time = time.time()\n",
        "    # show response, response time, cost for each question\n",
        "    response = completion(\n",
        "        model=model,\n",
        "        max_tokens=500,\n",
        "        messages = [\n",
        "            {\n",
        "              \"role\": \"system\", \"content\": system_prompt\n",
        "            },\n",
        "            {\n",
        "              \"role\": \"user\", \"content\": question\n",
        "            }\n",
        "        ],\n",
        "    )\n",
        "    end = time.time()\n",
        "    total_time = end-start_time # response time\n",
        "    # print(response)\n",
        "    cost = completion_cost(response) # cost for completion\n",
        "    raw_response = response['choices'][0]['message']['content'] # response string\n",
        "\n",
        "\n",
        "    # add log to pandas df\n",
        "    data.append(\n",
        "        {\n",
        "            'Model': model,\n",
        "            'Question': question,\n",
        "            'Response': raw_response,\n",
        "            'ResponseTime': total_time,\n",
        "            'Cost': cost\n",
        "        })"
      ],
      "metadata": {
        "id": "BpQD4A5339L3"
      },
      "execution_count": null,
      "outputs": []
    },
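    {
      "cell_type": "markdown",
      "source": [
        "The loop above times each call with `time.time()`, and a single failed request would stop the whole run. As a minimal sketch of one alternative, the cell below defines `timed_call`, a hypothetical helper that is not part of litellm: it measures elapsed time with `time.perf_counter()` and returns any exception instead of raising it. In the benchmark loop it could wrap the request as `timed_call(completion, model=model, max_tokens=500, messages=[...])`."
      ],
      "metadata": {}
    },
    {
      "cell_type": "code",
      "source": [
        "import time\n",
        "\n",
        "def timed_call(fn, *args, **kwargs):\n",
        "    \"\"\"Run fn and return (result, elapsed_seconds, error).\"\"\"\n",
        "    start = time.perf_counter()  # monotonic clock, suited to measuring intervals\n",
        "    try:\n",
        "        result, error = fn(*args, **kwargs), None\n",
        "    except Exception as exc:  # keep the failure instead of aborting the benchmark\n",
        "        result, error = None, exc\n",
        "    elapsed = time.perf_counter() - start\n",
        "    return result, elapsed, error\n",
        "\n",
        "# stand-in for a model call so this cell runs without API keys\n",
        "result, elapsed, error = timed_call(lambda text: text.upper(), 'hello')\n",
        "print(result, f'{elapsed:.6f}s', error)"
      ],
      "metadata": {},
      "execution_count": null,
      "outputs": []
    },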
    {
      "cell_type": "markdown",
      "source": [
        "## View Benchmarks for LLMs"
      ],
      "metadata": {
        "id": "apOSV3PBLa5Y"
      }
    },
    {
      "cell_type": "code",
      "source": [
        "from IPython.display import display\n",
        "from IPython.core.interactiveshell import InteractiveShell\n",
        "InteractiveShell.ast_node_interactivity = \"all\"\n",
        "from IPython.display import HTML\n",
        "import pandas as pd\n",
        "\n",
        "df = pd.DataFrame(data)\n",
        "grouped_by_question = df.groupby('Question')\n",
        "\n",
        "for question, group_data in grouped_by_question:\n",
        "    print(f\"Question: {question}\")\n",
        "    HTML(group_data.to_html())\n"
      ],
      "metadata": {
        "colab": {
          "base_uri": "https://localhost:8080/",
          "height": 1000
        },
        "id": "CJqBlqUh_8Ws",
        "outputId": "e02c3427-d8c6-4614-ff07-6aab64247ff6"
      },
      "execution_count": 22,
      "outputs": [
        {
          "output_type": "stream",
          "name": "stdout",
          "text": [
            "Question: does litellm support Anthropic LLMs\n"
          ]
        },
        {
          "output_type": "execute_result",
          "data": {
            "text/plain": [
              "<IPython.core.display.HTML object>"
            ],
            "text/html": [
              "<table border=\"1\" class=\"dataframe\">\n",
              "  <thead>\n",
              "    <tr style=\"text-align: right;\">\n",
              "      <th></th>\n",
              "      <th>Model</th>\n",
              "      <th>Question</th>\n",
              "      <th>Response</th>\n",
              "      <th>ResponseTime</th>\n",
              "      <th>Cost</th>\n",
              "    </tr>\n",
              "  </thead>\n",
              "  <tbody>\n",
              "    <tr>\n",
              "      <th>6</th>\n",
              "      <td>togethercomputer/llama-2-70b-chat</td>\n",
              "      <td>does litellm support Anthropic LLMs</td>\n",
              "      <td>Yes, litellm supports Anthropic LLMs.\\n\\nIn the example usage you provided, the `completion` function is called with the `model` parameter set to `\"gpt-3.5-turbo\"` for OpenAI and `\"command-nightly\"` for Cohere.\\n\\nTo use an Anthropic LLM with litellm, you would set the `model` parameter to the name of the Anthropic model you want to use, followed by the version number, if applicable. For example:\\n```\\nresponse = completion(model=\"anthropic-gpt-2\", messages=messages)\\n```\\nThis would call the Anthropic GPT-2 model to generate a completion for the given input messages.\\n\\nNote that you will need to set the `ANTHROPIC_API_KEY` environment variable to your Anthropic API key before making the call. You can do this by running the following command in your terminal:\\n```\\nos.environ[\"ANTHROPIC_API_KEY\"] = \"your-anthropic-api-key\"\\n```\\nReplace `\"your-anthropic-api-key\"` with your actual Anthropic API key.\\n\\nOnce you've set the environment variable, you can use the `completion` function with the `model` parameter set to an Anthropic model name to call the Anthropic API and generate a completion.</td>\n",
              "      <td>21.513009</td>\n",
              "      <td>0.001347</td>\n",
              "    </tr>\n",
              "    <tr>\n",
              "      <th>7</th>\n",
              "      <td>gpt-3.5-turbo</td>\n",
              "      <td>does litellm support Anthropic LLMs</td>\n",
              "      <td>No, currently litellm does not support Anthropic LLMs. It mainly focuses on simplifying the usage of OpenAI, Azure, Cohere, and Huggingface API endpoints.</td>\n",
              "      <td>8.656510</td>\n",
              "      <td>0.000342</td>\n",
              "    </tr>\n",
              "    <tr>\n",
              "      <th>8</th>\n",
              "      <td>claude-instant-1.2</td>\n",
              "      <td>does litellm support Anthropic LLMs</td>\n",
              "      <td>Yes, litellm supports calling Anthropic LLMs through the completion function.\\n\\nTo use an Anthropic model with litellm:\\n\\n1. Set the ANTHROPIC_API_KEY environment variable with your Anthropic API key\\n\\n2. Pass the model name as the 'model' argument to completion(). Anthropic model names follow the format 'anthropic/&lt;model_name&gt;'\\n\\nFor example:\\n\\n```python \\nimport os\\nfrom litellm import completion\\n\\nos.environ[\"ANTHROPIC_API_KEY\"] = \"your_anthropic_api_key\"\\n\\nmessages = [{\"content\": \"Hello\", \"role\": \"user\"}]\\n\\nresponse = completion(model=\"anthropic/constitutional\", messages=messages)\\n```\\n\\nThis would call the Constitutional AI model from Anthropic.\\n\\nSo in summary, litellm provides a simple interface to call any Anthropic models as long as you specify the model name correctly and set the ANTHROPIC_API_KEY env variable.</td>\n",
              "      <td>9.698195</td>\n",
              "      <td>0.001342</td>\n",
              "    </tr>\n",
              "  </tbody>\n",
              "</table>"
            ]
          },
          "metadata": {},
          "execution_count": 22
        },
        {
          "output_type": "stream",
          "name": "stdout",
          "text": [
            "Question: what is litellm?\n"
          ]
        },
        {
          "output_type": "execute_result",
          "data": {
            "text/plain": [
              "<IPython.core.display.HTML object>"
            ],
            "text/html": [
              "<table border=\"1\" class=\"dataframe\">\n",
              "  <thead>\n",
              "    <tr style=\"text-align: right;\">\n",
              "      <th></th>\n",
              "      <th>Model</th>\n",
              "      <th>Question</th>\n",
              "      <th>Response</th>\n",
              "      <th>ResponseTime</th>\n",
              "      <th>Cost</th>\n",
              "    </tr>\n",
              "  </thead>\n",
              "  <tbody>\n",
              "    <tr>\n",
              "      <th>0</th>\n",
              "      <td>togethercomputer/llama-2-70b-chat</td>\n",
              "      <td>what is litellm?</td>\n",
              "      <td>Litellm is a lightweight Python package that simplifies calling various AI API endpoints, including OpenAI, Azure, Cohere, Anthropic, and Hugging Face. It provides a convenient interface for making requests to these APIs, allowing developers to easily integrate them into their applications. With Litellm, developers can quickly and easily interact with multiple AI models and services, without having to handle the details of authentication, API calls, and response parsing. This makes it easier to build and deploy AI-powered applications, and can help developers save time and effort.</td>\n",
              "      <td>13.479644</td>\n",
              "      <td>0.000870</td>\n",
              "    </tr>\n",
              "    <tr>\n",
              "      <th>1</th>\n",
              "      <td>gpt-3.5-turbo</td>\n",
              "      <td>what is litellm?</td>\n",
              "      <td>litellm is a light package that provides a simplified interface for making API calls to various language models and APIs. It abstracts away the complexities of handling network requests, authentication, and response parsing, making it easier for developers to integrate powerful language models into their applications.\\n\\nWith litellm, you can quickly make API calls to models like OpenAI's GPT-3.5 Turbo, Azure's Text Analytics, Cohere's Command API, Anthropic's API, and Huggingface's models. It also supports additional functionality like conversational AI, summarization, translation, and more.\\n\\nBy using litellm, you can focus on your application logic without getting tangled in the details of API integration, allowing you to quickly build intelligent and conversational applications.</td>\n",
              "      <td>8.324332</td>\n",
              "      <td>0.000566</td>\n",
              "    </tr>\n",
              "    <tr>\n",
              "      <th>2</th>\n",
              "      <td>claude-instant-1.2</td>\n",
              "      <td>what is litellm?</td>\n",
              "      <td>litellm is a Python library that simplifies calling various AI API endpoints like OpenAI, Azure, Cohere, Anthropic, and Huggingface. \\n\\nSome key things to know about litellm:\\n\\n- It provides a consistent interface for completing prompts and generating responses from different AI models through a single method called completion().\\n\\n- You specify the API (e.g. OpenAI, Cohere etc.) and model either by name or by setting environment variables before making the completion call.\\n\\n- This avoids having to use different SDKs or APIs for each provider and standardizes the call structure. \\n\\n- It handles things like setting headers, encoding inputs, parsing responses so the user doesn't have to deal with those details.\\n\\n- The goal is to make it easy to try different AI APIs and models without having to change code or learn different interfaces.\\n\\n- It's lightweight with no other dependencies required besides what's needed for each API (e.g. openai, azure SDKs etc.).\\n\\nSo in summary, litellm is a small library that provides a common way to interact with multiple conversational AI APIs through a single Python method, avoiding the need to directly use each provider's specific SDK.</td>\n",
              "      <td>10.316488</td>\n",
              "      <td>0.001603</td>\n",
              "    </tr>\n",
              "  </tbody>\n",
              "</table>"
            ]
          },
          "metadata": {},
          "execution_count": 22
        },
        {
          "output_type": "stream",
          "name": "stdout",
          "text": [
            "Question: why should I use LiteLLM\n"
          ]
        },
        {
          "output_type": "execute_result",
          "data": {
            "text/plain": [
              "<IPython.core.display.HTML object>"
            ],
            "text/html": [
              "<table border=\"1\" class=\"dataframe\">\n",
              "  <thead>\n",
              "    <tr style=\"text-align: right;\">\n",
              "      <th></th>\n",
              "      <th>Model</th>\n",
              "      <th>Question</th>\n",
              "      <th>Response</th>\n",
              "      <th>ResponseTime</th>\n",
              "      <th>Cost</th>\n",
              "    </tr>\n",
              "  </thead>\n",
              "  <tbody>\n",
              "    <tr>\n",
              "      <th>3</th>\n",
              "      <td>togethercomputer/llama-2-70b-chat</td>\n",
              "      <td>why should I use LiteLLM</td>\n",
              "      <td>\\nThere are several reasons why you might want to use LiteLLM:\\n\\n1. Simplified API calls: LiteLLM provides a simple and consistent API for calling various language models, making it easier to use multiple models and switch between them.\\n2. Environment variable configuration: LiteLLM allows you to set environment variables for API keys and model names, making it easier to manage and switch between different models and APIs.\\n3. Support for multiple models and APIs: LiteLLM supports a wide range of language models and APIs, including OpenAI, Azure, Cohere, Anthropic, and Hugging Face.\\n4. Easy integration with popular frameworks: LiteLLM can be easily integrated with popular frameworks such as PyTorch and TensorFlow, making it easy to use with your existing codebase.\\n5. Lightweight: LiteLLM is a lightweight package, making it easy to install and use, even on resource-constrained devices.\\n6. Flexible: LiteLLM allows you to define your own models and APIs, making it easy to use with custom models and APIs.\\n7. Extensive documentation: LiteLLM has extensive documentation, making it easy to get started and learn how to use the package.\\n8. Active community: LiteLLM has an active community of developers and users, making it easy to get help and feedback on your projects.\\n\\nOverall, LiteLLM can help you to simplify your workflow, improve your productivity, and make it easier to work with multiple language models and APIs.</td>\n",
              "      <td>23.777885</td>\n",
              "      <td>0.001443</td>\n",
              "    </tr>\n",
              "    <tr>\n",
              "      <th>4</th>\n",
              "      <td>gpt-3.5-turbo</td>\n",
              "      <td>why should I use LiteLLM</td>\n",
              "      <td>LiteLLM is a lightweight Python package that simplifies the process of making API calls to various language models. Here are some reasons why you should use LiteLLM:\\n\\n1. **Simplified API Calls**: LiteLLM abstracts away the complexity of making API calls to different language models. It provides a unified interface for invoking models from OpenAI, Azure, Cohere, Anthropic, Huggingface, and more.\\n\\n2. **Easy Integration**: LiteLLM seamlessly integrates with your existing codebase. You can import the package and start making API calls with just a few lines of code.\\n\\n3. **Flexibility**: LiteLLM supports a variety of language models, including GPT-3, GPT-Neo, chatGPT, and more. You can choose the model that suits your requirements and easily switch between them.\\n\\n4. **Convenience**: LiteLLM handles the authentication and connection details for you. You just need to set the relevant environment variables, and the package takes care of the rest.\\n\\n5. **Quick Prototyping**: LiteLLM is ideal for rapid prototyping and experimentation. With its simple API, you can quickly generate text, chat with models, and build interactive applications.\\n\\n6. **Community Support**: LiteLLM is actively maintained and supported by a community of developers. You can find help, share ideas, and collaborate with others to enhance your projects.\\n\\nOverall, LiteLLM simplifies the process of making API calls to language models, saving you time and effort while providing flexibility and convenience.</td>\n",
              "      <td>12.109881</td>\n",
              "      <td>0.000881</td>\n",
              "    </tr>\n",
              "    <tr>\n",
              "      <th>5</th>\n",
              "      <td>claude-instant-1.2</td>\n",
              "      <td>why should I use LiteLLM</td>\n",
              "      <td>Here are some key reasons why you may want to consider using LiteLLM:\\n\\n- Simplifies calling multiple large language models - LiteLLM provides a unified API to call models from different providers like OpenAI, Azure, HuggingFace, Anthropic etc. This avoids having to deal with different APIs from each provider.\\n\\n- Easy to get started - LiteLLM is very lightweight and simple to install with just one pip install. It has minimal configuration needed with just API keys. This lowers the barrier to start experimenting with LLMs.\\n\\n- Standard interface - The completion API provides a standard way to send messages and get responses back regardless of underlying model. This keeps the code interface consistent. \\n\\n- Model agnostic - Your code doesn't need to change to switch between different models. You can easily compare performance of models from different providers.\\n\\n- Open source - As it is open source, the library can be expanded to support more models and providers over time via community contributions.\\n\\n- Cost effective - LiteLLM helps avoid vendor lock-in. You can optimize costs by testing multiple models and choosing the best suitable and cost-effective option for your needs.\\n\\nSo in summary, L</td>\n",
              "      <td>12.610259</td>\n",
              "      <td>0.001568</td>\n",
              "    </tr>\n",
              "  </tbody>\n",
              "</table>"
            ]
          },
          "metadata": {},
          "execution_count": 22
        },
        {
          "output_type": "stream",
          "name": "stdout",
          "text": [
            "Question: write code to make a litellm completion call\n"
          ]
        },
        {
          "output_type": "execute_result",
          "data": {
            "text/plain": [
              "<IPython.core.display.HTML object>"
            ],
            "text/html": [
              "<table border=\"1\" class=\"dataframe\">\n",
              "  <thead>\n",
              "    <tr style=\"text-align: right;\">\n",
              "      <th></th>\n",
              "      <th>Model</th>\n",
              "      <th>Question</th>\n",
              "      <th>Response</th>\n",
              "      <th>ResponseTime</th>\n",
              "      <th>Cost</th>\n",
              "    </tr>\n",
              "  </thead>\n",
              "  <tbody>\n",
              "    <tr>\n",
              "      <th>9</th>\n",
              "      <td>togethercomputer/llama-2-70b-chat</td>\n",
              "      <td>write code to make a litellm completion call</td>\n",
              "      <td>\\nTo make a completion call using litellm, you can use the `completion` function from the litellm package. Here's an example of how to use it:\\n```\\nfrom litellm import completion\\n\\n# Set ENV variables\\nos.environ[\"OPENAI_API_KEY\"] = \"your_openai_api_key\"\\nos.environ[\"COHERE_API_KEY\"] = \"your_cohere_api_key\"\\n\\n# Define the messages to be completed\\nmessages = [\\n    {\\n        \"content\": \"Hello, how are you?\",\\n        \"role\": \"user\"\\n    }\\n]\\n\\n# Make a completion call using OpenAI\\nresponse = completion(model=\"gpt-3.5-turbo\", messages=messages)\\n\\n# Make a completion call using Cohere\\nresponse = completion(\"command-nightly\", messages)\\n```\\nIn this example, we first set the ENV variables for the OpenAI and Cohere API keys. Then, we define a list of messages to be completed, which in this case contains a single message with the content \"Hello, how are you?\" and the role \"user\".\\n\\nNext, we make two completion calls using the `completion` function from litellm. The first call uses the OpenAI model `gpt-3.5-turbo` and passes in the list of messages. The second call uses the Cohere model `command-nightly` and passes in the same list of messages.\\n\\nThe `completion` function returns a response object that contains the completed messages. You can then use the `response.messages` attribute to access the completed messages.\\n\\nHere's an example of how to access the completed messages:\\n```\\n# Print the completed messages\\nprint(response.messages)\\n```\\nThis will print the completed messages, which should look something like this:\\n```\\n[{'content': 'Hello, how are you?', 'role': 'user', 'completion': 'I am doing well, thanks for asking.'}]\\n```\\nNote that the `completion` attribute contains the completed message. You can use this attribute to retrieve the completed message and use it in your application.</td>\n",
              "      <td>32.192453</td>\n",
              "      <td>0.001821</td>\n",
              "    </tr>\n",
              "    <tr>\n",
              "      <th>10</th>\n",
              "      <td>gpt-3.5-turbo</td>\n",
              "      <td>write code to make a litellm completion call</td>\n",
              "      <td>Sure! Here's an example code to make a completion call using the litellm package:\\n\\n```python\\nfrom litellm import completion\\n\\n# Set the model and messages\\nmodel = \"gpt-3.5-turbo\"\\nmessages = [\\n    {\"role\": \"system\", \"content\": \"You are a helpful assistant.\"},\\n    {\"role\": \"user\", \"content\": \"Who won the world series in 2020?\"},\\n    {\"role\": \"assistant\", \"content\": \"The Los Angeles Dodgers won the World Series in 2020.\"},\\n    {\"role\": \"user\", \"content\": \"Where was it played?\"}\\n]\\n\\n# Make the completion call\\nresponse = completion(model=model, messages=messages)\\n\\n# Print the assistant's reply\\nassistant_reply = response[\"choices\"][0][\"message\"][\"content\"]\\nprint(\"Assistant: \", assistant_reply)\\n```\\n\\nMake sure you have the litellm package installed (`pip install litellm`) and set the necessary environment variables for the API keys before running this code.</td>\n",
              "      <td>9.377155</td>\n",
              "      <td>0.000686</td>\n",
              "    </tr>\n",
              "    <tr>\n",
              "      <th>11</th>\n",
              "      <td>claude-instant-1.2</td>\n",
              "      <td>write code to make a litellm completion call</td>\n",
              "      <td>Here is an example of making a completion call using litellm:\\n\\n```python\\nimport os\\nfrom litellm import completion\\n\\n# Set API keys as environment variables\\nos.environ[\"OPENAI_API_KEY\"] = \"your openai api key\" \\n\\n# Conversation context \\nmessages = [{\\n  \"content\": \"Hello, how can I help you today?\",\\n  \"role\": \"assistant\"\\n}]\\n\\n# Make completion call with GPT-3 model\\nresponse = completion(\\n  model=\"gpt-3.5-turbo\", \\n  messages=messages\\n)\\n\\nprint(response)\\n```\\n\\nTo break it down:\\n\\n- Import completion from litellm\\n- Set the OPENAI_API_KEY env var \\n- Define a messages list with the conversation context\\n- Call completion(), specifying the model (\"gpt-3.5-turbo\") and messages\\n- It will return the response from the API\\n- Print the response\\n\\nThis makes a simple completion call to OpenAI GPT-3 using litellm to handle the API details. You can also call other models like Cohere or Anthropic by specifying their name instead of the OpenAI</td>\n",
              "      <td>9.839988</td>\n",
              "      <td>0.001578</td>\n",
              "    </tr>\n",
              "  </tbody>\n",
              "</table>"
            ]
          },
          "metadata": {},
          "execution_count": 22
        }
      ]
    },
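    {
      "cell_type": "markdown",
      "source": [
        "The tables above show one question at a time, which makes it hard to compare models across the whole run. As a rough sketch, the cell below aggregates the log per model. `summarize_by_model` is a hypothetical helper introduced here, and the sample rows are copied from the 'does litellm support Anthropic LLMs' table so the cell runs without API calls. Pass `data` instead of `sample` to summarize your own run."
      ],
      "metadata": {}
    },
    {
      "cell_type": "code",
      "source": [
        "import pandas as pd\n",
        "\n",
        "def summarize_by_model(records):\n",
        "    \"\"\"Return mean response time and total cost per model.\"\"\"\n",
        "    frame = pd.DataFrame(records)\n",
        "    return frame.groupby('Model').agg(\n",
        "        MeanResponseTime=('ResponseTime', 'mean'),\n",
        "        TotalCost=('Cost', 'sum'),\n",
        "    )\n",
        "\n",
        "# rows copied from the 'does litellm support Anthropic LLMs' table above\n",
        "sample = [\n",
        "    {'Model': 'togethercomputer/llama-2-70b-chat', 'ResponseTime': 21.513009, 'Cost': 0.001347},\n",
        "    {'Model': 'gpt-3.5-turbo', 'ResponseTime': 8.656510, 'Cost': 0.000342},\n",
        "    {'Model': 'claude-instant-1.2', 'ResponseTime': 9.698195, 'Cost': 0.001342},\n",
        "]\n",
        "summarize_by_model(sample)"
      ],
      "metadata": {},
      "execution_count": null,
      "outputs": []
    },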
    {
      "cell_type": "markdown",
      "source": [
        "## Use Case 2 - Rewrite user input concisely"
      ],
      "metadata": {
        "id": "bmtAbC1rGVAm"
      }
    },
    {
      "cell_type": "code",
      "source": [
        "# enter your system prompt if you have one\n",
        "system_prompt = \"\"\"\n",
        "For a given user input, rewrite the input to be more concise.\n",
        "\"\"\"\n",
        "\n",
        "# user inputs to rewrite\n",
        "questions = [\n",
        "    \"LiteLLM is a lightweight Python package that simplifies the process of making API calls to various language models. Here are some reasons why you should use LiteLLM:\\n\\n1. **Simplified API Calls**: LiteLLM abstracts away the complexity of making API calls to different language models. It provides a unified interface for invoking models from OpenAI, Azure, Cohere, Anthropic, Huggingface, and more.\\n\\n2. **Easy Integration**: LiteLLM seamlessly integrates with your existing codebase. You can import the package and start making API calls with just a few lines of code.\\n\\n3. **Flexibility**: LiteLLM supports a variety of language models, including GPT-3, GPT-Neo, chatGPT, and more. You can choose the model that suits your requirements and easily switch between them.\\n\\n4. **Convenience**: LiteLLM handles the authentication and connection details for you. You just need to set the relevant environment variables, and the package takes care of the rest.\\n\\n5. **Quick Prototyping**: LiteLLM is ideal for rapid prototyping and experimentation. With its simple API, you can quickly generate text, chat with models, and build interactive applications.\\n\\n6. **Community Support**: LiteLLM is actively maintained and supported by a community of developers. You can find help, share ideas, and collaborate with others to enhance your projects.\\n\\nOverall, LiteLLM simplifies the process of making API calls to language models, saving you time and effort while providing flexibility and convenience\",\n",
        "    \"Hi everyone! I'm [your name] and I'm currently working on [your project/role involving LLMs]. I came across LiteLLM and was really excited by how it simplifies working with different LLM providers. I'm hoping to use LiteLLM to [build an app/simplify my code/test different models etc]. Before finding LiteLLM, I was struggling with [describe any issues you faced working with multiple LLMs]. With LiteLLM's unified API and automatic translation between providers, I think it will really help me to [goals you have for using LiteLLM]. Looking forward to being part of this community and learning more about how I can build impactful applications powered by LLMs!Let me know if you would like me to modify or expand on any part of this suggested intro. I'm happy to provide any clarification or additional details you need!\",\n",
        "    \"Traceloop is a platform for monitoring and debugging the quality of your LLM outputs. It provides you with a way to track the performance of your LLM application; rollout changes with confidence; and debug issues in production. It is based on OpenTelemetry, so it can provide full visibility to your LLM requests, as well vector DB usage, and other infra in your stack.\"\n",
        "]"
      ],
      "metadata": {
        "id": "boiHO1PhGXSL"
      },
      "execution_count": 23,
      "outputs": []
    },
    {
      "cell_type": "markdown",
      "source": [
        "## Run Questions"
      ],
      "metadata": {
        "id": "fwNcC_obICUc"
      }
    },
    {
      "cell_type": "code",
      "source": [
        "import litellm\n",
        "from litellm import completion, completion_cost\n",
        "import os\n",
        "import time\n",
        "\n",
        "# optional: use the litellm dashboard to view logs\n",
        "# litellm.use_client = True\n",
        "# litellm.token = \"ishaan_2@berri.ai\" # set your email\n",
        "\n",
        "os.environ['TOGETHERAI_API_KEY'] = \"\"\n",
        "os.environ['OPENAI_API_KEY'] = \"\"\n",
        "os.environ['ANTHROPIC_API_KEY'] = \"\"\n",
        "\n",
        "models = ['togethercomputer/llama-2-70b-chat', 'gpt-3.5-turbo', 'claude-instant-1.2'] # enter llms to benchmark\n",
        "data_2 = []\n",
        "\n",
        "for question in questions: # group by question\n",
        "  for model in models:\n",
        "    print(f\"running question: {question} for model: {model}\")\n",
        "    start_time = time.time()\n",
        "    # show response, response time, cost for each question\n",
        "    response = completion(\n",
        "        model=model,\n",
        "        max_tokens=500,\n",
        "        messages = [\n",
        "            {\n",
        "              \"role\": \"system\", \"content\": system_prompt\n",
        "            },\n",
        "            {\n",
        "              \"role\": \"user\", \"content\": \"User input:\" + question\n",
        "            }\n",
        "        ],\n",
        "    )\n",
        "    end = time.time()\n",
        "    total_time = end-start_time # response time\n",
        "    # print(response)\n",
        "    cost = completion_cost(response) # cost for completion\n",
        "    raw_response = response['choices'][0]['message']['content'] # response string\n",
        "    #print(raw_response, total_time, cost)\n",
        "\n",
        "    # add to pandas df\n",
        "    data_2.append(\n",
        "        {\n",
        "            'Model': model,\n",
        "            'Question': question,\n",
        "            'Response': raw_response,\n",
        "            'ResponseTime': total_time,\n",
        "            'Cost': cost\n",
        "        })\n",
        "\n",
        "\n"
      ],
      "metadata": {
        "id": "KtBjZ1mUIBiJ"
      },
      "execution_count": null,
      "outputs": []
    },
    {
      "cell_type": "markdown",
      "source": [
        "## View Logs - Group by Question"
      ],
      "metadata": {
        "id": "-PCYIzG5M0II"
      }
    },
    {
      "cell_type": "code",
      "source": [
        "from IPython.display import HTML, display\n",
        "import pandas as pd\n",
        "\n",
        "df = pd.DataFrame(data_2)\n",
        "grouped_by_question = df.groupby('Question')\n",
        "\n",
        "# show one table per question, comparing each model's response, time and cost\n",
        "for question, group_data in grouped_by_question:\n",
        "    print(f\"Question: {question}\")\n",
        "    display(HTML(group_data.to_html()))\n"
      ],
      "metadata": {
        "colab": {
          "base_uri": "https://localhost:8080/",
          "height": 1000
        },
        "id": "-3R5-2q8IiL2",
        "outputId": "c4a0d9e5-bb21-4de0-fc4c-9f5e71d0f177"
      },
      "execution_count": 20,
      "outputs": [
        {
          "output_type": "stream",
          "name": "stdout",
          "text": [
            "Question: Hi everyone! I'm [your name] and I'm currently working on [your project/role involving LLMs]. I came across LiteLLM and was really excited by how it simplifies working with different LLM providers. I'm hoping to use LiteLLM to [build an app/simplify my code/test different models etc]. Before finding LiteLLM, I was struggling with [describe any issues you faced working with multiple LLMs]. With LiteLLM's unified API and automatic translation between providers, I think it will really help me to [goals you have for using LiteLLM]. Looking forward to being part of this community and learning more about how I can build impactful applications powered by LLMs!Let me know if you would like me to modify or expand on any part of this suggested intro. I'm happy to provide any clarification or additional details you need!\n"
          ]
        },
        {
          "output_type": "execute_result",
          "data": {
            "text/plain": [
              "<IPython.core.display.HTML object>"
            ],
            "text/html": [
              "<table border=\"1\" class=\"dataframe\">\n",
              "  <thead>\n",
              "    <tr style=\"text-align: right;\">\n",
              "      <th></th>\n",
              "      <th>Model</th>\n",
              "      <th>Question</th>\n",
              "      <th>Response</th>\n",
              "      <th>ResponseTime</th>\n",
              "      <th>Cost</th>\n",
              "    </tr>\n",
              "  </thead>\n",
              "  <tbody>\n",
              "    <tr>\n",
              "      <th>3</th>\n",
              "      <td>togethercomputer/llama-2-70b-chat</td>\n",
              "      <td>Hi everyone! I'm [your name] and I'm currently working on [your project/role involving LLMs]. I came across LiteLLM and was really excited by how it simplifies working with different LLM providers. I'm hoping to use LiteLLM to [build an app/simplify my code/test different models etc]. Before finding LiteLLM, I was struggling with [describe any issues you faced working with multiple LLMs]. With LiteLLM's unified API and automatic translation between providers, I think it will really help me to [goals you have for using LiteLLM]. Looking forward to being part of this community and learning more about how I can build impactful applications powered by LLMs!Let me know if you would like me to modify or expand on any part of this suggested intro. I'm happy to provide any clarification or additional details you need!</td>\n",
              "      <td>\\nHere's a more concise version of the user input:\\n\\n\"Hi everyone! I'm [your name] and I'm working on [your project/role involving LLMs]. I recently discovered LiteLLM and I'm excited to use it to [build an app/simplify my code/test different models etc]. Before LiteLLM, I struggled with [describe any issues you faced working with multiple LLMs]. I'm looking forward to using LiteLLM's unified API and automatic translation to achieve my goals. I'm eager to learn more about building impactful applications powered by LLMs and to be part of this community. Let me know if you have any questions or need further clarification.\"\\n\\nIn this revised version, we've kept the essential information and removed some of the extraneous language. We've also rephrased some of the sentences to make them more concise and easier to read.</td>\n",
              "      <td>18.300620</td>\n",
              "      <td>0.001200</td>\n",
              "    </tr>\n",
              "    <tr>\n",
              "      <th>4</th>\n",
              "      <td>gpt-3.5-turbo</td>\n",
              "      <td>Hi everyone! I'm [your name] and I'm currently working on [your project/role involving LLMs]. I came across LiteLLM and was really excited by how it simplifies working with different LLM providers. I'm hoping to use LiteLLM to [build an app/simplify my code/test different models etc]. Before finding LiteLLM, I was struggling with [describe any issues you faced working with multiple LLMs]. With LiteLLM's unified API and automatic translation between providers, I think it will really help me to [goals you have for using LiteLLM]. Looking forward to being part of this community and learning more about how I can build impactful applications powered by LLMs!Let me know if you would like me to modify or expand on any part of this suggested intro. I'm happy to provide any clarification or additional details you need!</td>\n",
              "      <td>User input: Hi, I'm [your name] and I'm excited about using LiteLLM to simplify working with different LLM providers. Before finding LiteLLM, I faced challenges working with multiple LLMs. With LiteLLM's unified API and automatic translation, I believe it will help me achieve my goals of [state your goals]. I look forward to being part of this community and learning how to build impactful applications with LLMs. Let me know if you need any further clarification or details.</td>\n",
              "      <td>7.385472</td>\n",
              "      <td>0.000525</td>\n",
              "    </tr>\n",
              "    <tr>\n",
              "      <th>5</th>\n",
              "      <td>claude-instant-1.2</td>\n",
              "      <td>Hi everyone! I'm [your name] and I'm currently working on [your project/role involving LLMs]. I came across LiteLLM and was really excited by how it simplifies working with different LLM providers. I'm hoping to use LiteLLM to [build an app/simplify my code/test different models etc]. Before finding LiteLLM, I was struggling with [describe any issues you faced working with multiple LLMs]. With LiteLLM's unified API and automatic translation between providers, I think it will really help me to [goals you have for using LiteLLM]. Looking forward to being part of this community and learning more about how I can build impactful applications powered by LLMs!Let me know if you would like me to modify or expand on any part of this suggested intro. I'm happy to provide any clarification or additional details you need!</td>\n",
              "      <td>Here is a more concise rewrite of the user input:\\n\\nHi everyone, I'm [your name]. I'm currently [your project/role] and came across LiteLLM, which simplifies working with different LLMs through its unified API. I hope to [build an app/simplify code/test models] with LiteLLM since I previously struggled with [issues]. LiteLLM's automatic translation between providers will help me [goals] and build impactful LLM applications. Looking forward to learning more as part of this community. Let me know if you need any clarification on my plans to use LiteLLM.</td>\n",
              "      <td>8.628217</td>\n",
              "      <td>0.001022</td>\n",
              "    </tr>\n",
              "  </tbody>\n",
              "</table>"
            ]
          },
          "metadata": {},
          "execution_count": 20
        },
        {
          "output_type": "stream",
          "name": "stdout",
          "text": [
            "Question: LiteLLM is a lightweight Python package that simplifies the process of making API calls to various language models. Here are some reasons why you should use LiteLLM:\n",
            "\n",
            "1. **Simplified API Calls**: LiteLLM abstracts away the complexity of making API calls to different language models. It provides a unified interface for invoking models from OpenAI, Azure, Cohere, Anthropic, Huggingface, and more.\n",
            "\n",
            "2. **Easy Integration**: LiteLLM seamlessly integrates with your existing codebase. You can import the package and start making API calls with just a few lines of code.\n",
            "\n",
            "3. **Flexibility**: LiteLLM supports a variety of language models, including GPT-3, GPT-Neo, chatGPT, and more. You can choose the model that suits your requirements and easily switch between them.\n",
            "\n",
            "4. **Convenience**: LiteLLM handles the authentication and connection details for you. You just need to set the relevant environment variables, and the package takes care of the rest.\n",
            "\n",
            "5. **Quick Prototyping**: LiteLLM is ideal for rapid prototyping and experimentation. With its simple API, you can quickly generate text, chat with models, and build interactive applications.\n",
            "\n",
            "6. **Community Support**: LiteLLM is actively maintained and supported by a community of developers. You can find help, share ideas, and collaborate with others to enhance your projects.\n",
            "\n",
            "Overall, LiteLLM simplifies the process of making API calls to language models, saving you time and effort while providing flexibility and convenience\n"
          ]
        },
        {
          "output_type": "execute_result",
          "data": {
            "text/plain": [
              "<IPython.core.display.HTML object>"
            ],
            "text/html": [
              "<table border=\"1\" class=\"dataframe\">\n",
              "  <thead>\n",
              "    <tr style=\"text-align: right;\">\n",
              "      <th></th>\n",
              "      <th>Model</th>\n",
              "      <th>Question</th>\n",
              "      <th>Response</th>\n",
              "      <th>ResponseTime</th>\n",
              "      <th>Cost</th>\n",
              "    </tr>\n",
              "  </thead>\n",
              "  <tbody>\n",
              "    <tr>\n",
              "      <th>0</th>\n",
              "      <td>togethercomputer/llama-2-70b-chat</td>\n",
              "      <td>LiteLLM is a lightweight Python package that simplifies the process of making API calls to various language models. Here are some reasons why you should use LiteLLM:\\n\\n1. **Simplified API Calls**: LiteLLM abstracts away the complexity of making API calls to different language models. It provides a unified interface for invoking models from OpenAI, Azure, Cohere, Anthropic, Huggingface, and more.\\n\\n2. **Easy Integration**: LiteLLM seamlessly integrates with your existing codebase. You can import the package and start making API calls with just a few lines of code.\\n\\n3. **Flexibility**: LiteLLM supports a variety of language models, including GPT-3, GPT-Neo, chatGPT, and more. You can choose the model that suits your requirements and easily switch between them.\\n\\n4. **Convenience**: LiteLLM handles the authentication and connection details for you. You just need to set the relevant environment variables, and the package takes care of the rest.\\n\\n5. **Quick Prototyping**: LiteLLM is ideal for rapid prototyping and experimentation. With its simple API, you can quickly generate text, chat with models, and build interactive applications.\\n\\n6. **Community Support**: LiteLLM is actively maintained and supported by a community of developers. You can find help, share ideas, and collaborate with others to enhance your projects.\\n\\nOverall, LiteLLM simplifies the process of making API calls to language models, saving you time and effort while providing flexibility and convenience</td>\n",
              "      <td>Here's a more concise version of the user input:\\n\\nLiteLLM is a lightweight Python package that simplifies API calls to various language models. It abstracts away complexity, integrates seamlessly, supports multiple models, and handles authentication. It's ideal for rapid prototyping and has community support. It saves time and effort while providing flexibility and convenience.</td>\n",
              "      <td>11.294250</td>\n",
              "      <td>0.001251</td>\n",
              "    </tr>\n",
              "    <tr>\n",
              "      <th>1</th>\n",
              "      <td>gpt-3.5-turbo</td>\n",
              "      <td>LiteLLM is a lightweight Python package that simplifies the process of making API calls to various language models. Here are some reasons why you should use LiteLLM:\\n\\n1. **Simplified API Calls**: LiteLLM abstracts away the complexity of making API calls to different language models. It provides a unified interface for invoking models from OpenAI, Azure, Cohere, Anthropic, Huggingface, and more.\\n\\n2. **Easy Integration**: LiteLLM seamlessly integrates with your existing codebase. You can import the package and start making API calls with just a few lines of code.\\n\\n3. **Flexibility**: LiteLLM supports a variety of language models, including GPT-3, GPT-Neo, chatGPT, and more. You can choose the model that suits your requirements and easily switch between them.\\n\\n4. **Convenience**: LiteLLM handles the authentication and connection details for you. You just need to set the relevant environment variables, and the package takes care of the rest.\\n\\n5. **Quick Prototyping**: LiteLLM is ideal for rapid prototyping and experimentation. With its simple API, you can quickly generate text, chat with models, and build interactive applications.\\n\\n6. **Community Support**: LiteLLM is actively maintained and supported by a community of developers. You can find help, share ideas, and collaborate with others to enhance your projects.\\n\\nOverall, LiteLLM simplifies the process of making API calls to language models, saving you time and effort while providing flexibility and convenience</td>\n",
              "      <td>LiteLLM is a lightweight Python package that simplifies API calls to various language models. Here's why you should use it:\\n1. Simplified API Calls: Works with multiple models (OpenAI, Azure, Cohere, Anthropic, Huggingface).\\n2. Easy Integration: Import and start using it quickly in your codebase.\\n3. Flexibility: Supports GPT-3, GPT-Neo, chatGPT, etc. easily switch between models.\\n4. Convenience: Handles authentication and connection details, just set environment variables.\\n5. Quick Prototyping: Great for rapid prototyping and building interactive applications.\\n6. Community Support: Actively maintained and supported by a developer community.</td>\n",
              "      <td>9.778315</td>\n",
              "      <td>0.000795</td>\n",
              "    </tr>\n",
              "    <tr>\n",
              "      <th>2</th>\n",
              "      <td>claude-instant-1.2</td>\n",
              "      <td>LiteLLM is a lightweight Python package that simplifies the process of making API calls to various language models. Here are some reasons why you should use LiteLLM:\\n\\n1. **Simplified API Calls**: LiteLLM abstracts away the complexity of making API calls to different language models. It provides a unified interface for invoking models from OpenAI, Azure, Cohere, Anthropic, Huggingface, and more.\\n\\n2. **Easy Integration**: LiteLLM seamlessly integrates with your existing codebase. You can import the package and start making API calls with just a few lines of code.\\n\\n3. **Flexibility**: LiteLLM supports a variety of language models, including GPT-3, GPT-Neo, chatGPT, and more. You can choose the model that suits your requirements and easily switch between them.\\n\\n4. **Convenience**: LiteLLM handles the authentication and connection details for you. You just need to set the relevant environment variables, and the package takes care of the rest.\\n\\n5. **Quick Prototyping**: LiteLLM is ideal for rapid prototyping and experimentation. With its simple API, you can quickly generate text, chat with models, and build interactive applications.\\n\\n6. **Community Support**: LiteLLM is actively maintained and supported by a community of developers. You can find help, share ideas, and collaborate with others to enhance your projects.\\n\\nOverall, LiteLLM simplifies the process of making API calls to language models, saving you time and effort while providing flexibility and convenience</td>\n",
              "      <td>Here is a concise rewrite of the user input:\\n\\nLiteLLM is a lightweight Python package that simplifies accessing various language models. It provides a unified interface for models from OpenAI, Azure, Cohere, Anthropic, Huggingface, and more. Key benefits include simplified API calls, easy integration, flexibility to use different models, automated handling of authentication, and support for quick prototyping. The actively maintained package saves time by abstracting away complexity while offering convenience and a collaborative community.</td>\n",
              "      <td>7.697520</td>\n",
              "      <td>0.001098</td>\n",
              "    </tr>\n",
              "  </tbody>\n",
              "</table>"
            ]
          },
          "metadata": {},
          "execution_count": 20
        },
        {
          "output_type": "stream",
          "name": "stdout",
          "text": [
            "Question: Traceloop is a platform for monitoring and debugging the quality of your LLM outputs. It provides you with a way to track the performance of your LLM application; rollout changes with confidence; and debug issues in production. It is based on OpenTelemetry, so it can provide full visibility to your LLM requests, as well vector DB usage, and other infra in your stack.\n"
          ]
        },
        {
          "output_type": "execute_result",
          "data": {
            "text/plain": [
              "<IPython.core.display.HTML object>"
            ],
            "text/html": [
              "<table border=\"1\" class=\"dataframe\">\n",
              "  <thead>\n",
              "    <tr style=\"text-align: right;\">\n",
              "      <th></th>\n",
              "      <th>Model</th>\n",
              "      <th>Question</th>\n",
              "      <th>Response</th>\n",
              "      <th>ResponseTime</th>\n",
              "      <th>Cost</th>\n",
              "    </tr>\n",
              "  </thead>\n",
              "  <tbody>\n",
              "    <tr>\n",
              "      <th>6</th>\n",
              "      <td>togethercomputer/llama-2-70b-chat</td>\n",
              "      <td>Traceloop is a platform for monitoring and debugging the quality of your LLM outputs. It provides you with a way to track the performance of your LLM application; rollout changes with confidence; and debug issues in production. It is based on OpenTelemetry, so it can provide full visibility to your LLM requests, as well vector DB usage, and other infra in your stack.</td>\n",
              "      <td>\\nRewritten input: Traceloop is a platform for monitoring and debugging LLM outputs. It allows users to track performance, rollout changes confidently, and debug issues in production. It uses OpenTelemetry for full visibility into LLM requests, vector DB usage, and other infrastructure.</td>\n",
              "      <td>9.060444</td>\n",
              "      <td>0.000525</td>\n",
              "    </tr>\n",
              "    <tr>\n",
              "      <th>7</th>\n",
              "      <td>gpt-3.5-turbo</td>\n",
              "      <td>Traceloop is a platform for monitoring and debugging the quality of your LLM outputs. It provides you with a way to track the performance of your LLM application; rollout changes with confidence; and debug issues in production. It is based on OpenTelemetry, so it can provide full visibility to your LLM requests, as well vector DB usage, and other infra in your stack.</td>\n",
              "      <td>Traceloop is a platform for monitoring and debugging the quality of your LLM outputs. It helps track performance, rollout changes, and debug issues in production. It is based on OpenTelemetry, providing visibility to LLM requests, vector DB usage, and other infrastructure in your stack.</td>\n",
              "      <td>7.304661</td>\n",
              "      <td>0.000283</td>\n",
              "    </tr>\n",
              "    <tr>\n",
              "      <th>8</th>\n",
              "      <td>claude-instant-1.2</td>\n",
              "      <td>Traceloop is a platform for monitoring and debugging the quality of your LLM outputs. It provides you with a way to track the performance of your LLM application; rollout changes with confidence; and debug issues in production. It is based on OpenTelemetry, so it can provide full visibility to your LLM requests, as well vector DB usage, and other infra in your stack.</td>\n",
              "      <td>Here is a more concise rewrite of the user input:\\n\\nTraceloop monitors and debugs LLM quality. It tracks LLM performance, enables confident changes, and debugs production issues. Based on OpenTelemetry, Traceloop provides full visibility into LLM requests, vector DB usage, and other stack infrastructure.</td>\n",
              "      <td>7.976158</td>\n",
              "      <td>0.000538</td>\n",
              "    </tr>\n",
              "  </tbody>\n",
              "</table>"
            ]
          },
          "metadata": {},
          "execution_count": 20
        }
      ]
    }
  ]
}